Arabinogalactan-proteins (AGPs) are complex glycoconjugates that are commonly found at
the cell surface and in secretions of plants. Their location and diversity of structures have 
made them attractive targets as modulators of plant development but definitive proof of 
their direct role(s) in biological processes remains elusive. Here we overview the current 
state of knowledge on AGPs, identify key challenges impeding progress in the field and 
propose approaches using modern bioinformatic, (bio)chemical, cell biological, molecular 
and genetic techniques that could be applied to redress these gaps in our knowledge. 

INTRODUCTION

Arabinogalactan-proteins (AGPs), ubiquitous cell surface pro- 
teoglycans in both terrestrial and aquatic plants (and algae), 
are proposed to play essential roles in a range of plant growth 
and development processes, including cell expansion, cell divi- 
sion, reproductive development, somatic embryogenesis, xylem 
differentiation, abiotic stress responses, and hormone signaling 
pathways (Seifert and Roberts, 2007; Ellis et al., 2010). These roles 
emerge from largely indirect evidence and from the "recognition" 
potential arising from the incredible diversity of their glycan and 
protein backbone moieties as well as their location: attached to the
outer leaflet of the plasma membrane by a glycosylphosphatidyli- 
nositol (GPI) anchor and in some instances cross-linked into the 
wall. Despite intense research to unravel AGP function, their 
molecular mechanism(s) of action remain elusive. AGPs exhibit 
complexity at many levels: First, AGP protein backbone genes 
are part of large gene families, and this makes the study of AGP 
function through characterization of single AGP mutants a major 
challenge due to gene redundancy (Ma and Zhao, 2010; Showalter 
et al., 2010). Second, AGP protein backbones are highly glyco-
sylated, hindering production of antibodies specifically directed
to the protein moiety that would allow for identification and 
isolation of single AGPs. Third, the variety of monosaccharides 
present in AGP carbohydrate moieties, the variety of linkages 
between these monosaccharides and the spatial arrangement of
these linkages provide high heterogeneity and complexity to the
AGP carbohydrates. These properties make purification of indi-
vidual AGPs difficult and expression of properly glycosylated AGPs
in heterologous systems problematic. Consequently, functional
evaluation of specific AGPs is not trivial. Incredible advances in 
cell and molecular biology, omics and computational sciences, 
however, offer the potential to unlock the mysteries of AGP struc- 
ture and function if harnessed in a synergistic manner. In this 
overview we briefly review the AGP field and identify and explain 
the key research challenges. 

STRUCTURES 

Arabinogalactan-proteins belong to the hydroxyproline-rich 
superfamily of glycoproteins (Schultz et al., 2002; Johnson et al.,
2003), being composed largely of carbohydrate (90-98% w/w)
with some protein, typically rich in the amino acids Hyp/Pro,
Ala, and Ser/Thr, that is usually covalently modified with a GPI anchor
at the C-terminus (see Figure 1). Historically, molecules were defined
as AGPs if they met three criteria: the presence of arabinogalactan chains,
a Hyp-rich protein backbone, and the ability to bind to a class
of synthetic phenylazo dyes, the β-glucosyl Yariv reagents (see Du
et al., 1996). Significant advances in our knowledge of their
carbohydrate structures, protein backbone sequences, and vari-
ability in Yariv binding have considerably complicated how an AGP
is defined. For instance, the diversity of protein backbones has 
led to a classification of the AGPs into different sub-classes based 
on the presence/absence of particular motifs/domains (Johnson 
et al., 2003). The carbohydrate moiety is typically in the form of
type II arabinogalactans (AGs), although some AGPs also con-
tain short arabino-oligosaccharide chains (Figure 1; Fincher et al.,
1983; Tan et al., 2004, 2010; Ellis et al., 2010). Type II AGs have also
been reported either as free polysaccharides (Ponder and Richards,
1997) or as side chains of rhamnogalacturonan I (RG-I; Caffall
and Mohnen, 2009). The existence of different forms of type II
AGs raises a few questions. Are free type II AGs generated from
AGPs by hydrolases in the wall or synthesized de novo? Are AG
side chains of RG-I derived either from AGPs by transglycosylases
or from covalently linked RG-I-AGP complexes? To understand
how this diversity impacts biological function, we face the chal-
lenge of isolating "individual" AGPs and sequencing their glycans
(and protein backbones).

FIGURE 1 | Schematic representation of the diversity of AGP
glycan structures (taken from Ellis et al., 2010). (A) The "wattle
blossom" model of the structure of AGPs with a GPI-membrane anchor
attached (modified from Fincher et al., 1983). (B) The "twisted-hairy
rope" model of the structure of the Gum Arabic glycoprotein (GAGP;
from Qi et al., 1991). (C) Primary structure of a representative
Hyp-AG polysaccharide (AHP-1) released by base hydrolysis from
a synthetic AGP (Ala-Hyp)51 from tobacco BY2 cells (modified
from Tan et al., 2004). (D) Larch AG structure (modified from
Ponder and Richards, 1997).

Another aspect of AGP research is the intriguing possibility
that they are one form of covalent cross linker for wall matrix 
phase polysaccharides. In the early 1970s, Keegstra et al. (1973)
hypothesized that Rha residues on AG side chains of AGPs might
be attachment sites for RG-I. Since then, AGPs/AGs have been
reported to form complexes with pectins (Yamada et al., 1987;
Saulnier et al., 1988; Kwan and Morvan, 1991; Pellerin et al., 1995;
Yamada, 2000; Duan et al., 2003, 2004) and xylans (Kwan and
Morvan, 1995). However, residues involved in the covalent cross- 
link between AG(P)s and wall polysaccharides have not been 
defined. Several major challenges must be addressed to determine 
if AGP polysaccharide complexes (APCs) exist and to determine 
the structure and function of any such complexes. 

CHALLENGE 1: ISOLATION AND PURIFICATION OF AGPs 

The incredible heterogeneity of AGP structures has hampered 
purification of individual AGPs. As a consequence, most stud- 
ies on AGPs have been on a family of molecules and often in the 
presence of contaminating polymers. There are a few examples of 
AGPs purified by a combination of traditional chromatographic 
methods (for example, anion exchange/lectin affinity/gel perme- 
ation using chaotropic reagents) and/or Yariv precipitation that 
are "pure" AGPs; for example, the AGPs from tobacco floral tissues 
(Gane et al., 1995) and larch AG exudates (Ponder and Richards,
1997). The application of molecular biology techniques to
isolate either heterologously expressed AGP protein backbones or syn-
thetic peptides as green fluorescent protein (GFP)-tagged fusion
proteins by the Kieliszewski/Showalter, Matsuoka, and Somerville
laboratories (Shpak et al., 1999; Zhao et al., 2002; Tan et al., 2004;
Shimizu et al., 2005; Estevez et al., 2006) was an ingenious inno-
vation that allowed the purification of AGPs with a single protein
backbone and therefore the study of inherent glycan heterogene-
ity. However, the low DP of these glycans raises some questions
about the fidelity of glycosylation in heterologous/high expression
systems. 

Thus, a combination of purification techniques is necessary to 
purify relatively homogeneous AGPs (and APCs extracted
as described below). These techniques take advantage of the het-
erogeneous structural features of AGPs and wall polysaccharides
including size, charge, hydrophobicity (Serpe and Nothnagel,
1996; Lamport et al., 2011), the ability to co-precipitate with Yariv
reagents, the availability of anti-AG antibodies (Pattathil et al.,
2010), and the use of tagged heterologously expressed protein 
backbones. 

CHALLENGE 2: EXTRACTION AND PURIFICATION OF 
PUTATIVE APCs FROM WALLS 

Because pectins and non-cellulosic polysaccharides are embedded 
within the highly cross-linked wall, the first obstacle to study- 
ing putative APCs is to extract intact macromolecules from the 
wall, especially from secondary walls. Traditional methods to 
release polysaccharides from the wall include either the use of 
wall-specific degrading enzymes (York et al., 1986) or the extrac-
tion of walls with increasingly harsh solvents (Fry, 1988). Since the
enzymatic and strong base treatments could also potentially break
covalent linkages between AGPs and wall polysaccharides, the
released polymers may only contain partial structural information
of potential APCs and may still contain contaminating wall
polysaccharides. 

To avoid these extraction complications it may be possible to 
source APCs from potentially rich sources such as suspension cul- 
ture media, especially of xylogenic calli, polysaccharide-rich seed 
mucilages, and exudates, such as gums (Defaye and Wong, 1986) 
and root mucilages (Moody et al., 1988) since these are released in
a "solubilized" form. 

CHALLENGE 3: SEQUENCING OF AGs AND APCs 

Our current knowledge of AG carbohydrate sequences is based
on experiments using tools that include monosaccharide com- 
position, linkage analysis, chemical or enzymatic degradation of 
glycans, mass spectrometry (MS), and NMR analysis. Partial acid 
hydrolysis (Defaye and Wong, 1986), acetolysis, alkaline degra-
dation, and Smith degradation (Churms et al., 1981; Bacic et al.,
1987) have supported the basic structures summarized in Figure 1
and led to the suggestion that the AG glycans contain a backbone
of β-(1,3)-galacto-oligosaccharides interrupted at regular intervals
with a periodate-sensitive residue. However, few of the large AG
chains have been de novo-sequenced due to the inherent biosyn- 
thetic heterogeneity and the current limitations of sequencing 
technologies. 

The availability of linkage-specific enzymes has greatly assisted 
the sequencing of glycans although their lack of commercial 
accessibility has hampered progress. Thus a breakthrough in 
AGP analysis was the identification of an AGP-specific
exo-β-(1,3)-galactanase that can bypass the β-(1,6)-galactosyl side
chains (Tsumuraya et al., 1990; Kotake et al., 2005; Ichinose et al.,
2006). This enzyme, together with the recently characterized
β-glucuronidase (Haque et al., 2005), α-arabinofuranosidase (Hata
et al., 1992), and endo-β-(1,6)-galactanase, forms an enzyme tool kit
specific for AG side-chain analysis, which enabled Tryfona et al.
(2010) to characterize some long β-(1,6)-galacto-oligosaccharide
AG side chains with the aid of MS/MS fragmentation (see Oxley
et al., 2004). A recent study of Arabidopsis AGP31 (Hijazi et al.,
2012), a chimeric AGP, illustrates the power of a multipronged 
approach to purification and characterization of AGPs. 

Therefore, the best solution is to sequence small structural units 
of AGs, generated using a combination of chemical and enzymic 
techniques, and then to re-construct models of the intact AGs. 
Discovery of new chemicals and enzymes that can selectively cleave 
AGs would facilitate future progress in the sequencing of the AG 
glycan chains. 

BIOINFORMATICS 

Genomics and its related technologies have revolutionized the 
study of biology, facilitated the development of other 'omics 
platforms, and created a need for bioinformatics to handle the 
acquisition, storage, and analysis of the vast amount of data gen- 
erated from 'omics and 'omics-related projects. The AGP field has 
greatly benefited from genomics and bioinformatics. Given that 
AGP protein backbone sequences often have low sequence sim- 
ilarity, BLAST-type searches typically identify only a few closely 
related AGP family members and, therefore, are not a particularly 
effective means to comprehensively identify members of the AGP 
family. In contrast, bioinformatics approaches have provided a 
broader and more complete picture of AGP gene/protein families.
Schultz et al. (2002) conducted the first comprehensive bioinfor-
matics analysis to identify and characterize AGP genes/proteins
from the Arabidopsis genome/proteome with respect to their pro-
tein backbones. This study was refined by Showalter et al. (2010),
who found 85 AGPs, including 22 classical AGPs, 3 lysine-rich 
AGPs, 16 AG-peptides, 21 fasciclin-like AGPs (FLAs), 17 plasto- 
cyanin AGPs, and 6 other chimeric AGPs. Ma and Zhao (2010) 
have conducted the only other comprehensive bioinformatics 
analysis for AGP protein backbones in rice. They identified 69 rice 
AGP protein backbones from the rice genome/proteome, includ- 
ing 13 classical AGPs, 15 AG-peptides, 3 non-classical AGPs, 3 
early nodulin-like AGPs (eNod-like AGPs), 8 non-specific lipid 
transfer protein-like AGPs (nsLTP-like AGPs), and 27 FLAs. A few 
other bioinformatic studies are reported for AGP protein back- 
bones, but these studies were not focused exclusively on AGPs 
and/or concentrated only on one particular sub-class (e.g., GPI- 
anchored AGPs or FLAs). For example, Borner et al. (2002, 2003) 
used bioinformatics to identify GPI-anchored proteins in Ara-
bidopsis from genomic and proteomic data. In addition, Irshad
et al. (2008) applied bioinformatic analysis to their cell wall pro-
teomic data in Arabidopsis to identify several AGPs and Faik et al.
(2006) used bioinformatic analyses to identify 34 wheat and 24 rice
FLAs. Bioinformatic tools have also been used to provide insight
into the glycosyltransferases (GTs) involved in the assembly of AGP
glycans (see Biosynthesis of Glycan Moieties) and in this way
Bacic and colleagues have proposed that the CAZy GT 31 family
comprises putative β-(1,3)-GalTs (Qu et al., 2008; Ellis et al., 2010;
Egelund et al., 2011).
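
Because AGP backbones share biased amino acid composition rather than sequence similarity, composition-based screening can be illustrated with a minimal sketch. The code below computes the combined percentage of Pro, Ala, Ser, and Thr (PAST) in a protein sequence. The function names, the 50% cut-off, and the toy sequences are assumptions made for illustration only; this is not the published program, and a real screen would also need signal peptide and GPI anchor predictions.

```python
PAST_RESIDUES = frozenset("PAST")  # Pro, Ala, Ser, Thr (one-letter codes)


def past_percentage(sequence: str) -> float:
    """Return the percentage of Pro, Ala, Ser, and Thr residues in a sequence."""
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("sequence contains no residues")
    hits = sum(1 for aa in residues if aa in PAST_RESIDUES)
    return 100.0 * hits / len(residues)


def is_candidate_backbone(sequence: str, threshold: float = 50.0) -> bool:
    """Flag sequences whose PAST content meets an assumed (hypothetical) cut-off."""
    return past_percentage(sequence) >= threshold


if __name__ == "__main__":
    # Invented toy sequences, not real AGP backbones.
    examples = {
        "toy_agp_like": "APAPSPTSPAPAAPSPK",
        "toy_control": "MKLVDEGHIKLNQRWY",
    }
    for name, seq in examples.items():
        print(name, round(past_percentage(seq), 1), is_candidate_backbone(seq))
```

In such a scheme the cut-off would be tuned against known AGP backbones, since a threshold that is too low admits other Pro-rich wall proteins and one that is too high misses chimeric AGPs.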

The comprehensive bioinformatic studies on AGPs also took 
advantage of other related genomic technologies, including 
microarray data to reveal organ-specific expression patterns, 
abiotic- and biotic-regulated expression profiles, and genes which 
are co-expressed. Co-expression analysis has the potential to reveal 
networks of genes that are related to particular aspects of AGP 
biology, including their biosynthesis, interacting partners, and 
physiological functions. These kinds of downstream bioinfor- 
matic analyses are just in their infancy and many bioinformatics 
challenges lie ahead relating to AGPs, as outlined below.

CHALLENGE 4: IDENTIFYING AND CLASSIFYING AGPs FROM 
OTHER SEQUENCED PLANT GENOMES 

Over 30 plant and algal genomes/proteomes are now known 
(see http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes#Algae).
It would be useful to apply either the current
or improved bioinformatics programs to these various datasets. 
Suggested enhancements to the programs would include mak- 
ing the bioinformatic analysis more automated and integrating 
the programs for predicting signal peptides, GPI anchor addi- 
tion sites, gene expression, co-expression analysis, etc. into a 
single program. In addition, based on existing protein sequence 
and carbohydrate data on AGPs, a bioinformatics program pre- 
dicting sites of prolyl hydroxylation and corresponding sites and 
type of glycosylation (i.e., AGs and arabino-oligosaccharides) 
could be developed and used. This relies on our knowledge 
that the types of O-glycosylation on the AGP protein back-
bone can be predicted from the Hyp-contiguity hypothesis, which
defines Hyp (arabino)galactosylation as occurring on clus-
tered, non-contiguous Hyp residues separated by Ala or Ser
residues in a protein backbone, whereas blocks of contiguous Hyp
residues, such as occur in extensins, are arabinosylated with short
oligosaccharides (Kieliszewski and Lamport, 1994; Shpak et al.,
1999; Goodrum et al., 2000; Zhao et al., 2002). N-glycosylation
is predicted by the universally conserved consensus amino acid
sequence Asn-X-Ser/Thr, where X can be any amino acid except 
for Pro. Similarly, the specificity of prolyl hydroxylation by 
prolyl-4-hydroxylase, although not as well defined in plants as 
in mammalian systems (Gorres and Raines, 2010), can be used 
together with the Hyp-contiguity hypothesis to inform design of 
bioinformatics programs. 
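
The two sequence rules described above can be sketched in a few lines of code. The example below scans a backbone for Asn-X-Ser/Thr sequons (X not Pro) and applies a deliberately simplified reading of the Hyp-contiguity hypothesis. It assumes, for illustration only, that every Pro is hydroxylated, which is not true in planta; the function names and labels are hypothetical, and the rule for "clustered, non-contiguous" residues (a single Pro with another Pro two positions away across an Ala or Ser) is our simplification.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glycosylation_sequons(seq: str) -> List[int]:
    """Return 1-based positions of Asn in Asn-X-Ser/Thr sequons (X is not Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def classify_hyp_sites(seq: str) -> List[Tuple[int, str]]:
    """Label each Pro (assumed to be Hyp) by a simplified Hyp-contiguity rule."""
    seq = seq.upper()
    calls: List[Tuple[int, str]] = []
    for m in re.finditer(r"P+", seq):
        start, end = m.start(), m.end()
        if end - start >= 2:
            # Contiguous block: predicted short arabino-oligosaccharides.
            label = "arabino-oligosaccharide"
        else:
            # Single Pro: look for a neighbouring Pro across one Ala/Ser.
            left = start >= 2 and seq[start - 2] == "P" and seq[start - 1] in "AS"
            right = end + 1 < len(seq) and seq[end + 1] == "P" and seq[end] in "AS"
            label = "arabinogalactan" if (left or right) else "unassigned"
        for pos in range(start, end):
            calls.append((pos + 1, label))
    return calls


if __name__ == "__main__":
    toy = "SPPPPKAPAPSPKNGT"  # invented sequence for illustration
    print(find_n_glycosylation_sequons(toy))
    print(classify_hyp_sites(toy))
```

A usable predictor would replace the blanket hydroxylation assumption with a model of prolyl-4-hydroxylase specificity, which is the less well-defined step in plants.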

CHALLENGE 5: APPLYING AND IMPROVING BIOINFORMATIC ANALYSES 
OF MICROARRAY DATA TO ELUCIDATE PATTERNS 
OF AGP (CO-)EXPRESSION

Unfortunately, not all of the sequenced plant genomes have exten-
sive publicly available microarray data, unlike Arabidopsis and
rice (e.g., see PLEXdb, http://www.plexdb.org/). Thus, in addi- 
tion to generating new microarray data, it would be convenient 
to utilize and integrate expression analysis programs like Gen- 
evestigator and co-expression analyzer tools (see Table 1) to mine 
data and provide it in a more tailored manner. Analysis of such 
data can provide remarkable insight into the function (and func- 
tional redundancy) of AGP protein backbone genes as well as 
elucidate networks of AGPs and AGP-related genes involved in 
various metabolic pathways. 
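
As a minimal illustration of co-expression analysis, the sketch below ranks genes by the Pearson correlation of their expression profiles with a chosen AGP "bait" gene. The gene names, expression values, and the 0.8 cut-off are hypothetical; dedicated tools apply normalization, larger condition sets, and multiple-testing control that are omitted here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def rank_coexpressed(
    bait: str, profiles: Dict[str, Sequence[float]], min_r: float = 0.8
) -> List[Tuple[str, float]]:
    """Return genes whose correlation with the bait is at least min_r, best first."""
    bait_profile = profiles[bait]
    hits = [
        (gene, pearson(bait_profile, profile))
        for gene, profile in profiles.items()
        if gene != bait
    ]
    return sorted((h for h in hits if h[1] >= min_r), key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical expression values across four tissues.
    profiles = {
        "AGP_x": [1.0, 2.0, 3.0, 4.0],
        "GT_a": [2.0, 4.0, 6.0, 8.5],
        "GT_b": [4.0, 3.0, 2.0, 1.0],
    }
    print(rank_coexpressed("AGP_x", profiles))
```

Genes that rank highly with several AGP baits would be the more convincing candidates, since a single pairwise correlation is easily produced by shared tissue specificity alone.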

CHALLENGE 6: IMPROVING AND DEVELOPING NEW BIOINFORMATICS 
PROGRAMS TO ELUCIDATE MOLECULAR PHYLOGENIES OF AGP
PROTEIN BACKBONE GENES 

It would be interesting from an evolutionary standpoint to under- 
stand how AGPs are related within and between species, since such 
analysis may explain how the AGP gene family evolved and pro- 
vide insight into AGP function. From a functional perspective, it 
would be useful to be able to identify AGP gene orthologs and 
paralogs. Software developers would use the gene families identi- 
fied in Challenge 4 through packages summarized in Table 1 and 
the extensive web-based resources developed for studying gene 
ontology to focus on the AGP protein backbone genes. 

CHALLENGE 7: DEVELOP BIOINFORMATICS TOOLS TO IDENTIFY AND 
CLASSIFY GENES/PROTEINS INVOLVED WITH AGP METABOLISM 

Bioinformatic tools to identify genes involved with the biosyn- 
thesis and possible modification and degradation of AGPs would 
be of great benefit. In particular, bioinformatics analysis has the 
potential to identify GTs likely to be involved in the biosynthesis 
of AG chains. Currently, sequence similarities to mammalian GTs 
represent one approach to identifying these enzymes, for example, 
as recently described by Egelund et al. (2011), in which the authors
adopted a bioinformatic approach to identify and systematically
characterize putative GalTs from CAZy GT-family-31 responsi-
ble for synthesizing the β-(1,3)-Gal linkage. This study revealed
that the Arabidopsis accessions grouped into four plant-specific
clades (1, 7, 10, and 11; Table 2). Furthermore, the investiga-
tors attempted to predict the possible substrate specificity of these
GalTs based on secondary structure and conserved motifs shared
with known β-(1,3)-GalTs (Qu et al., 2008). These predictions have
formed the basis for detailed biochemical and molecular studies to
define the precise substrate specificities of GT-31 family members.

Table 1 | Bioinformatic programs used to identify and characterize AGPs.

Program | Program use | Web address
PAST percentage calculator | Identification of AGP backbones | http://www.adelaide.edu.au/directory/carolyn.schultz (under Files)
BIO OHIO | Identification of AGP backbones and more | http://code.google.com/p/prot-class/
SignalP | Identification of signal peptides | http://www.cbs.dtu.dk/services/SignalP/
Plant big-PI predictor | Identification of GPI anchor addition sites | http://mendel.imp.ac.at/gpi/plant_server.html
Genevestigator | Identification of gene expression | https://www.genevestigator.ethz.ch/
Arabidopsis Co-Response Database | Identification of co-expressed genes | http://csbdb.mpimp-golm.mpg.de/csbdb/dbcor/ath.html

Table 2 | The 31 putative GalTs from the Arabidopsis thaliana CAZy
GT-family-31 and their proposed function.

Sub-clade | Accession | Proposed function
1 | At1g33250, At5g12460, At4g15240, At4g00300, At1g01570, At4g11350, At2g37730, At3g11420, At1g05280, At4g23490, At1g07850 | β-(1,3)-GalTs; substrate unknown
7 | At1g27120, At1g74800, At5g62620, At3g06440, At1g26810, At4g21060 | β-(1,3)-GalTs involved in the synthesis of N- and O-glycans, and β-(1,3)-GlcNAcTs; substrate unknown
10 | At4g32120, At1g11730, At1g22015, At5g53340, At2g25300, At1g77810, At2g32430, At1g05170, At1g53290, At2g26100, At3g14960, At1g33430, At4g26940, At1g32930 | β-GalTs involved in the synthesis of AGPs
11 | At5g57500 | β-(1,3)-GalTs; substrate unknown

The accessions clustered into four plant-specific sub-groups according to Egelund
et al. (2011). Substrate specificities were predicted based on secondary structure
and conserved motifs shared with known β-(1,3)-GalTs (Qu et al., 2008).

In a similar manner co-expression analysis of either selected 
AGP or groups of AGP protein backbones provides another, 
largely unexplored, option to identify candidate GTs respon-
sible for AG biosynthesis (and/or degradation in muro). This
idea is based on the premise that once the gene encoding the 
AGP protein backbone is expressed, other genes needed for 
AGP biosynthesis should also be co-expressed. In addition, co- 
expression analysis in conjunction with computational prediction 
of sub-cellular location and known protein-protein interaction 
data of candidate proteins involved in AGP biosynthesis could
be used to identify proteins that function together in a complex
(Mostafavi et al., 2008; http://www.genemania.org). Such infor-
mation could be integrated into an "interactome" focusing on AGP
biosynthesis. 

CHALLENGE 8: DEVELOP BIOINFORMATICS TOOLS TO IDENTIFY 
REGULATORY SEQUENCES IN AGP PROTEIN BACKBONE GENES 

Bioinformatics has the potential to reveal gene regulatory 
sequences involved in regulated expression of AGP genes with 
respect to developmental expression (e.g., tissue- and temporal- 
specific expression) and a variety of stresses. Bioinformatic pro- 
grams that have the ability to recognize either conserved nucleotide 
patterns alone or in combination with chromatin immunopre- 
cipitation (ChIP) assays followed by DNA sequencing have the 
potential to reveal AGP gene regulatory sequences and the cor- 
responding trans-acting factors. Knowledge of such regulatory 
sequences would reveal commonly regulated networks of AGP 
genes as well as other co-regulated genes. As such, this information 
may be complementary to co-expression data and would provide 
another avenue to elucidating AGP function(s). 
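
The idea of recognizing conserved nucleotide patterns can be illustrated with a deliberately naive sketch that reports k-mers shared among the promoters of a set of genes. The promoter sequences, gene names, and parameter values are invented for illustration; real motif discovery tools model degenerate positions, both strands, and background composition, none of which is attempted here.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def kmers_in(sequence: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers found on the given strand."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(
    promoters: Dict[str, str], k: int = 6, min_fraction: float = 1.0
) -> List[Tuple[str, int]]:
    """List k-mers present in at least min_fraction of the promoters."""
    if not promoters:
        raise ValueError("no promoter sequences supplied")
    counts: Counter = Counter()
    for seq in promoters.values():
        counts.update(kmers_in(seq, k))  # each promoter counts once per k-mer
    needed = math.ceil(min_fraction * len(promoters))
    hits = [(kmer, n) for kmer, n in counts.items() if n >= needed]
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical upstream sequences sharing one hexamer.
    promoters = {
        "gene1": "TTACGTGGAA",
        "gene2": "CCACGTGGTT",
        "gene3": "GGACGTGGCC",
    }
    print(shared_kmers(promoters, k=6))
```

Candidate patterns found this way would still need to be tested against promoters of unrelated genes, and ideally against ChIP data, before being treated as regulatory elements.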

BIOSYNTHESIS OF GLYCAN MOIETIES 

Many mammalian, fungal, and bacterial GTs have been iden-
tified, cloned, and biochemically characterized (Cantarel et al.,
2009; Ellis et al., 2010). In contrast, only a few plant cell wall
polysaccharide/proteoglycan-related GTs have been characterized
biochemically (Doblin et al., 2010). From studies of Arabidop-
sis at the molecular and biochemical level (Strasser et al., 2007;
Qu et al., 2008), and from assembly of mammalian proteogly-
cans, it is expected that AG glycan chains that decorate AGPs
are synthesized by type II membrane-bound GTs located in the
Golgi apparatus. This includes members of CAZy GT-family-31
with putative β-(1,3)-GalT activity that are suggested to be
involved in synthesis of the β-(1,3)-Gal backbone in AG glycans
(Qu et al., 2008; Egelund et al., 2011).

Early studies showed that the Golgi apparatus plays an impor-
tant role in synthesis of β-(1,6)-Gal of the AG glycan chains of
AGPs (Mascara and Fincher, 1982; Schibeci et al., 1984), whereas
the initial enzyme in the AG biosynthetic pathway, adding the
first Gal residue to a Hyp residue on the protein backbone (the
Hyp-O-galactosyltransferase or HGT), is predominantly located
in the ER (Oka et al., 2010). Outside of the development of in vitro
assays to monitor GalT activity (Qu et al., 2008; Liang et al., 2010;
Oka et al., 2010), no significant progress on biochemical character-
ization of GalTs involved in synthesis of the AG glycans on AGPs
has been made since the mid-1980s, severely restricting our under-
standing of AGP biology and potential industrial/pharmaceutical
applications. It is reasonable to assume that for assembly of
AG chains, several GalTs will be required, such as HGT(s),
β-(1,3)-GalTs, and β-(1,6)-GalTs, and these enzymes will work
co-ordinately to regulate the density, length, and sequence of
the galactan chain. In addition, several GTs responsible for dec-
orating termini of AG chains, i.e., arabinosyltransferases (AraTs),
rhamnosyltransferases (RhaTs), fucosyltransferases (FUTs), and
glucuronosyltransferases (GlcATs) are also involved. Recently,
AtFUT4 and AtFUT6, two members of CAZy GT-family-37,
were characterized as Golgi-located α(1,2)-FUTs and are the first
enzymes demonstrated to have a specific function in AGP glyco-
sylation (Wu et al., 2010). To ensure continued momentum in the
field, we suggest a focused co-ordinated approach on three core 
challenges: 

CHALLENGE 9: AN ALTERNATIVE APPROACH FOR THE 
IDENTIFICATION OF THE GLYCOSYLATION MACHINERY
INVOLVED IN AG CHAIN SYNTHESIS 

An alternative approach to the one described in Challenge 7 cen-
ters on the analysis of Gum Arabic, a tree exudate from Acacia
species, whose main fraction is an AG (Defaye and Wong, 1986;
Randall et al., 1989; Al-Assaf et al., 2005). AG chains comprise as
much as 90-98% of the gum exudate (Osman et al., 1993), thus
making Gum Arabic-producing cells from the Acacia trees an obvi- 
ous choice as starting material to identify enzymes involved in AG 
biosynthesis. 

CHALLENGE 10: BIOSYNTHESIS OF PUTATIVE APCs 

The challenge is to determine in which sub-cellular compartment
putative APCs are assembled and by what mechanism. One possi-
bility is that APCs are synthesized intracellularly in the ER/Golgi
apparatus by multiple GTs (as proposed for AGPs and other
non-cellulosic polysaccharides) by either en bloc transfer of pre-
assembled oligosaccharides or stepwise sugar addition, followed by
delivery into the wall. Another possibility is that APCs are assem-
bled in the extracellular matrix, possibly by transglycosylases, a
mechanism that has been well studied in xyloglucan remodeling
within the wall (Rose et al., 2002) and is commonly utilized by
yeast to modify their wall in response to abiotic/biotic stimuli
(Kollar et al., 1997).

CHALLENGE 11: HETEROLOGOUS EXPRESSION SYSTEMS 

Expression of non-cellulosic/cellulosic plant GTs in functional 
assay systems remains a key challenge. The past lack of success 
of this approach has been ascribed to the mismatch between 
biochemical assays and native activity, failure of the expressed 
protein to accumulate to sufficient levels, incorrect folding or 
improper post-translational modifications (Petersen et al., 2009).
The most obvious choice would be to develop an "in planta" sys-
tem; however, the endogenous GT activities can make it difficult to
distinguish the specific activity of the expressed protein (Petersen
et al., 2009). Prokaryotes, of which some have limited capacity for
post-translational processing, pose other problems. We therefore
suggest developing multiple heterologous expression systems to
maximize the likelihood that at least one will allow for successful 
expression where the biochemical activity is retained. Addition- 
ally, testing new expression systems that may prove "universal" 
(e.g., Aspergillus), which has served as one of the preferred expres- 
sion systems in the biotechnology industry, as well as cell-free 
expression systems may prove useful for heterologous expression 
of plant GTs. 

CHALLENGE 12: A HIGH-THROUGHPUT ENZYME ACTIVITY 
SCREENING SYSTEM 

The assignment of substrate specificity to GTs is often hindered 
by difficulties related to limited availability of relevant candi- 
date acceptor molecules for biochemical assays. To overcome 
this challenge the next step should be to employ carbohydrate 
array technology (Moller et al., 2007) with AGP/Gum Arabic-
specific sugars and peptides, related acceptor substrates, i.e.,
natural acceptors from Gum Arabic and AGPs [e.g.,
β-(1,3)-galacto-oligosaccharides generated by Smith degradation (see
Challenge 3), de-arabinosylated AGPs generated by mild acid,
chemically synthesized β-(1,3)-Gal oligosaccharides and isolated
AGP protein backbones] together with other "AGP-enriched" frac- 
tions from wild type, AGP GT mutants, and Gum Arabic exudates. 

Combining AGP-related arrays with established in vitro assays 
will facilitate a high-throughput screening system that can be 
used to test heterologously expressed candidate GTs in mixtures 
with either radio-labeled or fluorescently tagged NDP-sugar as the 
donor to identify AGP-specific carbohydrate acceptor molecules 
on the array. Development of such a comprehensive screening 
system would be a significant step in identifying the many GTs 
responsible for AG biosynthesis. 

FUNCTION 

Arabinogalactan-protein glycan-specific antibodies and β-Glc
Yariv reagent have been broadly used to investigate AGP activ-
ity in tissue culture and in planta (Seifert and Roberts, 2007;
Ellis et al., 2010). The current use of these two indirect tools
continues to provide information on AGP activity in new bio- 
logical systems, e.g., European larch, Larix decidua (Rafinska and 
Bednarska, 2011), and little studied developmental processes, in 
this case, ovule development in gymnosperms, confirming the 
relevance and the conservation of function of these molecules 
within the plant kingdom. Unfortunately, the broad specificity of 
these techniques makes it impossible to assign function to a single 
AGP. This limitation has been partially overcome by genetic and 
molecular studies, including the characterization of AGP single or 
double mutants, RNAi and over-expressing lines, although these 
approaches also have complications. 

The usefulness of reverse genetics approaches to investigate 
AGP backbone function is well demonstrated. The functions of
one cotton FLA, GhAGP24, in cotton fiber initiation and elonga-
tion (Li et al., 2010a) and of four Arabidopsis members, FLA1, FLA3,
FLA11, and FLA12, have recently been published (Li et al., 2010a,b;
MacMillan et al., 2010; Johnson et al., 2011). Roles for FLA1 in lat-
eral root and shoot development in tissue culture prior to cell-type
specification (Johnson et al., 2011) and FLA3 in microspore devel-
opment, possibly by participation in cellulose deposition within
the intine (Li et al., 2010b), have been described. FLAs 11 and
12 have also been implicated in the process of cellulose deposi-
tion, contributing to plant stem strength and elasticity by affecting
cell wall integrity (MacMillan et al., 2010). Such a function is
consistent with an earlier report by Shi et al. (2003) implicating
FLA4/SOS5 in maintaining proper cell expansion under salt- 
stressed conditions. The apparent diversity of FLA function may 
be due to the ability of FLAs to mediate protein-protein interac- 
tions with cell wall or plasma membrane-associated ligands via 
their fasciclin-like domains, shown in other eukaryotic systems to 
facilitate cell adhesion. 

In addition to FLA3, AGP6 and AGP11, two classical AGPs
specifically expressed in pollen, have been demonstrated to be
involved in the control of timing of pollen germination, as pollen
of the agp6 agp11 double mutant germinates precociously inside
the anthers (Coimbra et al., 2010). How the presence of AGP6 and
AGP11 avoids precocious pollen germination is unknown, but it
may occur by regulating water uptake. 

The Lys-rich AGP sub-family has been the focus of several 
studies in tomato and Arabidopsis. Functional characterization of 
AtAGP18, one of the three Lys-rich AGPs, by over-expression of 
the genomic sequence in Arabidopsis indicates that AGP18 plays a 
role in vegetative growth and sexual reproduction (Acosta-Garcia 
and Vielle-Calzada, 2004; Zhang et al., 2011a). The bushy phenotype 
of the over-expressing plants resembles that of tomato lines 
over-expressing LeAGP1 and is similar to that of tobacco plants 
over-producing cytokinins (Zhang et al., 2011b), leading to the 
suggestion that AGP18 may participate in a cytokinin signal 
transduction pathway as a co-receptor of 
cytokinins. A similar model has been proposed for FLA4/SOS5 
in its interactions with two members of the leucine-rich repeat 
receptor-like kinase family, FEI1 and FEI2, shown by double 
mutant analyses to have non-additive genetic interactions with 
FLA4/SOS5 (Xu et al., 2008). SOS5 has been hypothesized to 
act as the ligand of a signal molecule that then either binds 
directly to FEI1/FEI2 or assists in presenting the signal molecule 
to FEI1/FEI2, initiating a signaling cascade that regulates the 
synthesis of cellulose and ultimately cell growth. 

Several questions arise from this ligand model of AGP function. 
Given the effects on cellulose in the fla11 fla12 double null 
mutant (MacMillan et al., 2010) and the abnormal cellulose deposition 
in fla3 RNAi lines (Li et al., 2010b), might FLA11, FLA12, and 
FLA3, as well as other GPI-anchored or non-anchored AGPs, also 
be a part of this same network of components involved in wall 
sensing? Does this model explain the observation of AGPs as cell 
fate markers in tissues undergoing cell differentiation? Consider- 
ing that the appearance of AGPs during specific developmental 
stages has been described using antibodies that recognize AGP- 
carbohydrate epitopes, is the heterogeneity of AGP glycosylation 
also involved in providing the necessary specificity to interact with 
different signal molecules and generate specific responses? What 
is the relevance of the presence and number of fasciclin domains 
of FLAs? Further investigation of the possible function of AGPs in 
wall sensing is of fundamental importance to uncover some of the 
components and mechanisms involved in the regulation of wall 
biosynthesis and ultimately plant cell growth. To address some of 
these challenges, we propose the use of the following experimental 
approaches, techniques, and resources: 



CHALLENGE 13: TARGETING FUNCTIONAL REDUNDANCY OF AGPs 

The application of multiple gene knock-down technologies such 
as double-stranded RNAi or artificial micro-RNAs could allow 
the silencing of putative redundant genes within the different 
AGP protein backbone subfamilies and therefore overcome the 
problems associated with functional redundancy. The detection of 
specific expression patterns and changes in transcript levels of AGP 
protein backbone genes has also assisted in directing the application 
of targeted experimental approaches to reveal their function, 
highlighting the importance of the availability and analysis of 
transcriptional data. 

In addition, the use of co-expression gene network analyses 
to identify genes possibly related to AGP function, including 
those implicated in environmental sensing and signal transduction, 
would help to deepen our knowledge of the relationship, if 
any, between AGPs and the regulation of wall growth and integrity. 
The characterization of the promoters of AGP genes specifically 
expressed in pollen is generating detailed information of the tissue 
and spatiotemporal location of AGP transcripts that will allow the 
implementation of more targeted experimental approaches to test 
the function of pollen AGPs (Anand and Tyagi, 2010; Choi et al., 
2010; Yang et al., 2011). However, when using transcriptional data 
as a guide to study gene function, we should be aware that in some 
cases mRNA levels have not been in agreement with protein levels 
(Yang et al., 2011). 
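
As an illustration of the co-expression approach suggested above, the 
following minimal sketch ranks genes by the Pearson correlation of their 
expression profiles with that of a "bait" AGP gene. It is not taken from 
any of the cited studies: the expression values are invented, the gene 
names GENE_A, GENE_B, and GENE_C are hypothetical placeholders, and the 
correlation threshold of 0.9 is an arbitrary assumption. 

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression information
    return cov / (sd_x * sd_y)


def coexpressed_partners(expression, bait, threshold=0.9):
    """Return (gene, r) pairs whose |r| with the bait meets the threshold.

    `expression` maps gene name -> list of expression values measured
    across the same ordered set of tissues or conditions.
    """
    bait_profile = expression[bait]
    hits = []
    for gene, profile in expression.items():
        if gene == bait:
            continue
        r = pearson(bait_profile, profile)
        if abs(r) >= threshold:
            hits.append((gene, r))
    # Strongest absolute correlation first
    return sorted(hits, key=lambda hit: -abs(hit[1]))


# Invented toy data: five hypothetical tissues/conditions per gene.
EXPRESSION = {
    "FLA11": [1.0, 2.0, 3.0, 4.0, 5.0],    # bait (values are made up)
    "GENE_A": [2.1, 4.0, 6.2, 8.1, 9.9],   # rises with the bait
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as the bait rises
    "GENE_C": [3.0, 1.0, 4.0, 1.0, 5.0],   # no clear relationship
}

partners = coexpressed_partners(EXPRESSION, "FLA11")
for gene, r in partners:
    print(f"{gene}\t{r:+.3f}")
```

In practice such a ranking would be computed over many arrays or RNA-seq 
samples and corrected for multiple testing; as noted above, candidates 
identified from transcript data alone still require confirmation at the 
protein level. 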

CHALLENGE 14: PRODUCTION OF SPECIFIC AGP PROTEIN 
BACKBONE ANTIBODIES 

The recent production of antibodies specifically recognizing the 
Lys-rich region of AtAGP17 and AtAGP19 protein backbones 
both demonstrates the validity of this approach and provides 
tools to study in more detail their tissue and cellular distribution 
and ultimately their function (Yang et al., 2011). Either these 
antibodies or, alternatively, antibodies to tagged versions of AGPs 
could be used in co-localization and immunoprecipitation experiments 
to identify possible interacting partners. 

CHALLENGE 15: DETERMINING THE FUNCTIONAL SIGNIFICANCE 
OF AG GLYCAN CHAIN HETEROGENEITY 

One approach to address the functional importance of the gly- 
can moiety of AGPs is to characterize AGP-specific GT mutants. 
Mutants implicated in AGP glycan moiety biosynthesis by transcript 
co-expression analysis could also be analyzed as single 
mutants and in combination with mutants in other GTs to potentially 
increase the severity of the plant phenotype. Limiting analyses to either single cell 
types (e.g., pollen/pollen tubes), or simple tissues with limited 
cell-types, would help in these analyses and provide a more 
restricted list of candidate GT genes. These genes could then 
be heterologously co-expressed and cellular fractions used in 
biochemical assays for functional assessment. While the ini- 
tial aim of this work is to identify the GTs involved in AGP 
glycan synthesis, the underlying objective is to use these and 
other AGP mutants as functional assay systems to dissect the 
mechanism and pathway of AGP synthesis in greater detail. 
Such mutants are only useful as a means to manipulate AGPs 
if a visible and/or measurable or assayable AGP phenotype is 
observed. 

CONCLUSION 

In this brief overview we have attempted to summarize what we 
believe to be the major challenges facing the research community 
in attempting to unravel the structure, function, and biosynthesis 
of AGPs and to provide some indicators on how we might progress. 
In addition, we believe there is much to be learnt from the advances 
our colleagues in the microbial, fungal, and mammalian proteoglycan 
fields have made, and we encourage AGP researchers to embrace 
these findings as a guide to advancing AGP research. 